
Auto eval actually works #310

Merged: 15 commits into main, Aug 30, 2024
Conversation

@vwxyzjn vwxyzjn (Collaborator) commented Aug 29, 2024

Auto eval that actually works (also with oe-eval). The idea is to do something like the following:

Update (10:30): now testing with docker containers.

```python
start_time = time.time()
while time.time() - start_time < args.max_wait_time_for_beaker_dataset_upload_seconds:
    if beaker_experiment_succeeded(beaker_runtime_config.beaker_workload_id):
        print("Experiment succeeded")
        # NOTE: we are assuming the first beaker dataset has the model
        # I have checked a couple of beaker jobs and found the first dataset is the model
        # but we should check this assumption
        submit_beaker_eval_jobs(
            model_name=args.model_name,
            location=beaker_dataset_ids[0],
            run_oe_eval_experiments=True,
        )
        return
    time.sleep(args.check_interval_seconds)
# If we reach here, the experiment failed
print("Experiment failed")
sys.exit(1)  # submit eval failed
```
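The wait-and-check logic above is a poll-with-timeout pattern. As a standalone sketch (not code from this PR), it can be factored into a small helper; the `sleep` and `clock` parameters are added here purely so the pattern is testable without real waiting:

```python
import time


def poll_until(check, timeout_s, interval_s, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True on success, False on timeout. sleep/clock are
    injectable so tests can run without actually sleeping.
    """
    start = clock()
    while clock() - start < timeout_s:
        if check():
            return True
        sleep(interval_s)
    return False
```

With this helper, the loop in the PR collapses to `if poll_until(lambda: beaker_experiment_succeeded(...), args.max_wait_time_for_beaker_dataset_upload_seconds, args.check_interval_seconds): submit_beaker_eval_jobs(...)`.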

[screenshot: docker submission]

[screenshot: mason submission]

open_instruct/utils.py (resolved)
Comment on lines +35 to +37
# NOTE: we are assuming the first beaker dataset has the model
# I have checked a couple of beaker jobs and found the first dataset is the model
# but we should check this assumption
vwxyzjn (Collaborator, Author):

@yizhongw @jacob-morrison For multi-node jobs I assumed the first beaker dataset will have the model. Is this always the case?

Contributor:

I checked my experiment list and it seems so; I didn't see any exceptions.

Collaborator:

My one concern is whether this is still the case when preemption happens, but that's sort of an edge case, so happy to merge and then patch later.
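Rather than relying on dataset ordering, one could select the dataset whose file list looks like a model checkpoint. A minimal sketch, assuming each dataset can be listed as a `(dataset_id, [filenames])` pair; `pick_model_dataset` and the marker filenames are hypothetical helpers for illustration, not the Beaker API:

```python
def pick_model_dataset(datasets):
    """Return the id of the first dataset that looks like a model checkpoint.

    `datasets` is a list of (dataset_id, [filenames]) pairs. The marker
    filenames below are an assumption about what a HF-style checkpoint
    contains, not something guaranteed by Beaker.
    """
    MODEL_MARKERS = ("config.json", "pytorch_model.bin", "model.safetensors")
    for dataset_id, files in datasets:
        if any(name.endswith(MODEL_MARKERS) for name in files):
            return dataset_id
    raise ValueError("no dataset with model files found")
```

A check like this would also be robust to preempted jobs that produce extra log-only datasets before the model one.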

@vwxyzjn vwxyzjn marked this pull request as ready for review August 29, 2024 17:17
```diff
@@ -704,7 +763,7 @@ def upload_metadata_to_hf(
     # about a model for leaderboard displays.
     with open("tmp.json", "w") as f:
         json.dump(metadata_dict, f)
-    api = HfApi(token=os.getenv("HF_TOKEN", None))
+    api = HfApi()
```
vwxyzjn (Collaborator, Author):


huggingface_hub already picks up the HF_TOKEN environment variable by default, so there is no need to pass it explicitly.
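The default-from-environment pattern can be illustrated without huggingface_hub itself; `Client` below is a stand-in for the idea, not the HfApi implementation. An explicitly passed token wins, otherwise the environment variable is consulted, which is why `HfApi(token=os.getenv("HF_TOKEN", None))` was redundant:

```python
import os


class Client:
    """Stand-in illustrating token resolution: explicit argument first,
    then the HF_TOKEN environment variable."""

    def __init__(self, token=None):
        self._token = token

    @property
    def token(self):
        # Fall back to the environment only when no token was passed.
        return self._token if self._token is not None else os.environ.get("HF_TOKEN")
```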

```diff
@@ -1001,49 +1001,65 @@ def main(args: FlatArguments):
     )

     # remove all checkpoints to save space
-    if accelerator.is_main_process:
+    if accelerator.is_local_main_process:
```
vwxyzjn (Collaborator, Author):

This should remove the intermediate checkpoints on all the nodes.

Contributor:

Looks good!

@vwxyzjn vwxyzjn mentioned this pull request Aug 29, 2024
@natolambert natolambert (Collaborator) left a comment:

seems good

@vwxyzjn vwxyzjn merged commit 4c0e9e8 into main Aug 30, 2024
3 checks passed
